Text Analysis for Clinical Research

Our current experiences at UBEP

ITALY/UniPD/DSCTVPH/UBEP/LAIMS/CorradoLanera.phd

Tools for Text Analysis

Neural Network

RNN

Transformer

Applications for Text Analysis

(L)LM

Opportunities for Text Analysis

UBEP on Text Analysis

| Project | Data | Records | Tool | Test |
|---|---|---|---|---|
| Extending SRs | citation set | 7494 (14 SR) | RF/SVM | 0.934-0.999 (AUC-ROC) |
| VZV detection | Pediatrician free-text notes | 60659 | RNN | 0.953 (AUC-ROC) |
| Otitis Classification | Pediatrician free-text notes | 297673 | RNN | 0.955 (Balanced F1) |
| Injuries Classification | ED discharge free-text notes | 8194 | GPT-4 | 99.5-1.0 (Accuracy) |
| SR Screening | citation set | 3080 | Humans + ASReview | 0.96-0.98 (AUC-ROC) |
| SR Screening | citation set | 24931 | GPT-4o | fine-tuning |
| SR Screening | citation set | 535+ | GPT-4o | developing |
| Classification, Extraction & Matching | citation & registries set | 594587 | GPT-4o | running |
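
As a reading aid for the AUC-ROC values reported above, here is a minimal sketch of how the metric can be computed from predicted scores. It uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive record receives the higher score. The labels and scores are hypothetical toy values, not data from the UBEP projects, and the function name `auc_roc` is an illustrative choice.

```python
def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = relevant record, 0 = irrelevant record.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = auc_roc(y_true, scores)
print(f"AUC-ROC: {auc:.3f}")
```

The pairwise loop is quadratic in the number of records, which is fine for illustration; for record counts like those in the table, a rank-based implementation from a statistics library would be the practical choice.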

Comparisons of Text Analysis

| Model | Type | Access | Computation | Accuracy (n = 320) |
|---|---|---|---|---|
| GPT-4o | pretrained | paid | remote | TBD |
| GPT-3.5-turbo | pretrained | paid | remote | TBD |
| Llama 3 70B | pretrained | free | local | 98.6 (97.6 balanced) |
| Llama 3 8B | pretrained | free | local | 92.2 (90.8 balanced) |
| Gemma 1.1 7B | pretrained | free | local | 79.1 (75.7 balanced) |
| RF | fitted | self-developed | local | 77.6 |
| GBM | fitted | self-developed | local | 62.8 |
| NB | fitted | self-developed | local | 60.6 |
| SVM | fitted | self-developed | local | 57.8 |
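
The table reports both plain and "balanced" accuracy. Assuming "balanced" means balanced accuracy (the mean of per-class recall), the sketch below shows why the two can differ when classes are imbalanced. The labels are hypothetical toy values, not the 320 records used in the comparison, and the function names are illustrative.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Share of records whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class weighs the same regardless of size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced toy set: 8 "injury" records, 2 "no_injury" records.
y_true = ["injury"] * 8 + ["no_injury"] * 2
y_pred = ["injury"] * 8 + ["injury", "no_injury"]

acc = accuracy(y_true, y_pred)           # 9 of 10 records correct
bal = balanced_accuracy(y_true, y_pred)  # recall 1.0 and 0.5, averaged
print(f"accuracy={acc:.2f}, balanced accuracy={bal:.2f}")
```

In this toy case one error in the minority class barely moves plain accuracy but halves that class's recall, which is the kind of gap the two figures in the table can reveal.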

Reality for Text Analysis

References

Lanera, Corrado, Paola Berchialla, Abhinav Sharma, Clara Minto, Dario Gregori, and Ileana Baldi. 2019. “Screening PubMed Abstracts: Is Class Imbalance Always a Challenge to Machine Learning?” Systematic Reviews 8 (1). https://doi.org/10.1186/s13643-019-1245-8.
Lanera, Corrado, Ileana Baldi, Andrea Francavilla, Elisa Barbieri, Lara Tramontan, Antonio Scamarcia, Luigi Cantarutti, Carlo Giaquinto, and Dario Gregori. 2022. “A Deep Learning Approach to Estimate the Incidence of Infectious Disease Cases for Routinely Collected Ambulatory Records: The Example of Varicella-Zoster.” International Journal of Environmental Research and Public Health 19 (10): 5959. https://doi.org/10.3390/ijerph19105959.
Lanera, Corrado, Giulia Lorenzoni, Elisa Barbieri, Gianluca Piras, Arjun Magge, Davy Weissenbacher, Daniele Donà, et al. 2024. “Monitoring the Epidemiology of Otitis Using Free-Text Pediatric Medical Notes: A Deep Learning Approach.” Journal of Personalized Medicine 14 (1): 28. https://doi.org/10.3390/jpm14010028.
Lanera, Corrado, Clara Minto, Abhinav Sharma, Dario Gregori, Paola Berchialla, and Ileana Baldi. 2018. “Extending PubMed Searches to ClinicalTrials.gov Through a Machine Learning Approach for Systematic Reviews.” Journal of Clinical Epidemiology 103 (November): 22–30. https://doi.org/10.1016/j.jclinepi.2018.06.015.
Lorenzoni, Giulia, Dario Gregori, Silvia Bressan, Honoria Ocagli, Danila Azzolina, Liviana Da Dalt, and Paola Berchialla. 2024. “Use of a Large Language Model to Identify and Classify Injuries With Free-Text Emergency Department Data.” JAMA Network Open 7 (5): e2413208. https://doi.org/10.1001/jamanetworkopen.2024.13208.
OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, et al. 2024. “GPT-4 Technical Report.” arXiv. https://doi.org/10.48550/arXiv.2303.08774.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023. “Attention Is All You Need.” arXiv. https://doi.org/10.48550/arXiv.1706.03762.

Thank you for your attention